Goto

Collaborating Authors

 target protein


Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration

Neural Information Processing Systems

In recent years, AI-assisted drug design methods have been proposed to generate molecules given the pockets' structures of target proteins. Most of them are {\em atom-level-based} methods, which consider atoms as basic components and generate atom positions and types. In this way, however, it is hard to generate realistic fragments with complicated structures. To solve this, we propose \textsc{D3FG}, a {\em functional-group-based} diffusion model for pocket-specific molecule generation and elaboration.


Prior-Guided Flow Matching for Target-Aware Molecule Design with Learnable Atom Number

Zhou, Jingyuan, Qian, Hao, Tu, Shikui, Xu, Lei

arXiv.org Artificial Intelligence

Structure-based drug design (SBDD), aiming to generate 3D molecules with high binding affinity toward target proteins, is a vital approach in novel drug discovery. Although recent generative models have shown great potential, they suffer from unstable probability dynamics and mismatch between generated molecule size and the protein pockets geometry, resulting in inconsistent quality and off-target effects. We propose PAFlow, a novel target-aware molecular generation model featuring prior interaction guidance and a learnable atom number predictor. PAFlow adopts the efficient flow matching framework to model the generation process and constructs a new form of conditional flow matching for discrete atom types. A protein-ligand interaction predictor is incorporated to guide the vector field toward higher-affinity regions during generation, while an atom number predictor based on protein pocket information is designed to better align generated molecule size with target geometry. Extensive experiments on the CrossDocked2020 benchmark show that PAFlow achieves a new state-of-the-art in binding affinity (up to -8.31 Avg. Vina Score), simultaneously maintains favorable molecular properties.


PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

Song, Zhenqiao, Li, Tiaoxiao, Li, Lei, Min, Martin Renqiang

arXiv.org Artificial Intelligence

Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advancements targeting specific proteins, the ability to create high-affinity binders for arbitrary protein targets on demand, without extensive rounds of wet-lab testing, remains a significant challenge. Here, we introduce PPDiff, a diffusion model to jointly design the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. PPDiffbuilds upon our developed Sequence Structure Interleaving Network with Causal attention layers (SSINC), which integrates interleaved self-attention layers to capture global amino acid correlations, k-nearest neighbor (kNN) equivariant graph layers to model local interactions in three-dimensional (3D) space, and causal attention layers to simplify the intricate interdependencies within the protein sequence. To assess PPDiff, we curate PPBench, a general protein-protein complex dataset comprising 706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on PPBenchand finetuned on two real-world applications: target-protein mini-binder complex design and antigen-antibody complex design. PPDiffconsistently surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and 16.89% for the pretraining task and the two downstream applications, respectively. The code, data and models are available at https://github.com/JocelynSong/PPDiff.



SOLD: SELFIES-based Objective-driven Latent Diffusion

Ho, Elbert

arXiv.org Artificial Intelligence

Recently, machine learning has made a significant impact on de novo drug design. However, current approaches to creating novel molecules conditioned on a target protein typically rely on generating molecules directly in the 3D conformational space, which are often slow and overly complex. In this work, we propose SOLD (SELFIES-based Objective-driven Latent Diffusion), a novel latent diffusion model that generates molecules in a latent space derived from 1D SELFIES strings and conditioned on a target protein. In the process, we also train an innovative SELFIES transformer and propose a new way to balance losses when training multi-task machine learning models.Our model generates high-affinity molecules for the target protein in a simple and efficient way, while also leaving room for future improvements through the addition of more data.


ExMolRL: Phenotype-Target Joint Generation of De Novo Molecules via Multi-Objective Reinforcement Learning

Guo, Haotian, Liu, Hui

arXiv.org Artificial Intelligence

The generation of high-quality candidate molecules remains a central challenge in AI-driven drug design. Current phenotype-based and target-based strategies each suffer limitations, either incurring high experimental costs or overlook system-level cellular responses. To bridge this gap, we propose ExMoIRL, a novel generative framework that synergistically integrates phenotypic and target-specific cues for de novo molecular generation. The phenotype-guided generator is first pretrained on expansive drug-induced transcriptional profiles and subsequently fine-tuned via multi-objective reinforcement learning (RL). Crucially, the reward function fuses docking affinity and drug-likeness scores, augmented with ranking loss, prior-likelihood regularization, and entropy maximization. The multi-objective RL steers the model toward chemotypes that are simultaneously potent, diverse, and aligned with the specified phenotypic effects. Extensive experiments demonstrate ExMoIRL's superior performance over state-of-the-art phenotype-based and target-based models across multiple well-characterized targets. Our generated molecules exhibit favorable drug-like properties, high target affinity, and inhibitory potency (IC50) against cancer cells. This unified framework showcases the synergistic potential of combining phenotype-guided and target-aware strategies, offering a more effective solution for de novo drug discovery.


Controllable 3D Molecular Generation for Structure-Based Drug Design Through Bayesian Flow Networks and Gradient Integration

Choi, Seungyeon, Kim, Hwanhee, Park, Chihyun, Lee, Dahyeon, Lee, Seungyong, Kim, Yoonju, Park, Hyoungjoon, Kwon, Sein, Jo, Youngwan, Park, Sanghyun

arXiv.org Artificial Intelligence

Recent advances in Structure-based Drug Design (SBDD) have leveraged generative models for 3D molecular generation, predominantly evaluating model performance by binding affinity to target proteins. However, practical drug discovery necessitates high binding affinity along with synthetic feasibility and selectivity, critical properties that were largely neglected in previous evaluations. To address this gap, we identify fundamental limitations of conventional diffusion-based generative models in effectively guiding molecule generation toward these diverse pharmacological properties. We propose CByG, a novel framework extending Bayesian Flow Network into a gradient-based conditional generative model that robustly integrates property-specific guidance. Additionally, we introduce a comprehensive evaluation scheme incorporating practical benchmarks for binding affinity, synthetic feasibility, and selectivity, overcoming the limitations of conventional evaluation methods. Extensive experiments demonstrate that our proposed CByG framework significantly outperforms baseline models across multiple essential evaluation criteria, highlighting its effectiveness and practicality for real-world drug discovery applications.


Generative molecule evolution using 3D pharmacophore for efficient Structure-Based Drug Design

He, Yi, Wang, Ailun, Wang, Zhi, Liu, Yu, Xu, Xingyuan, Yan, Wen

arXiv.org Artificial Intelligence

Recent advances in generative models, particularly diffusion and auto-regressive models, have revolutionized fields like computer vision and natural language processing. However, their application to structure-based drug design (SBDD) remains limited due to critical data constraints. To address the limitation of training data for models targeting SBDD tasks, we propose an evolutionary framework named MEVO, which bridges the gap between billion-scale small molecule dataset and the scarce protein-ligand complex dataset, and effectively increase the abundance of training data for generative SBDD models. MEVO is composed of three key components: a high-fidelity VQ-VAE for molecule representation in latent space, a diffusion model for pharmacophore-guided molecule generation, and a pocket-aware evolutionary strategy for molecule optimization with physics-based scoring function. This framework efficiently generate high-affinity binders for various protein targets, validated with predicted binding affinities using free energy perturbation (FEP) methods. In addition, we showcase the capability of MEVO in designing potent inhibitors to KRAS$^{\textrm{G12D}}$, a challenging target in cancer therapeutics, with similar affinity to the known highly active inhibitor evaluated by FEP calculations. With high versatility and generalizability, MEVO offers an effective and data-efficient model for various tasks in structure-based ligand design.


A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design

Tuo, Hao, Li, Yan, Hu, Xuanning, Zhao, Haishi, Liu, Xueyan, Yang, Bo

arXiv.org Artificial Intelligence

Drug design, particularly in the discovery of lead compounds, is of core strategic importance to combating disease and enhancing human well-being. Prevailing computational methods, however, struggle to effectively integrate domain-specific knowledge, severely limiting their capacity to identify novel lead compounds with validated binding modes and new scaffolds. Here, we propose AutoLeadDesign, a lead compounds design framework that inspires extensive domain knowledge encoded in large language models with chemical fragments to progressively implement efficient exploration of vast chemical space. The comprehensive experiments indicate that AutoLeadDesign outperforms baseline methods. Significantly, empirical lead design campaigns targeting two clinically relevant targets (PRMT5 and SARS-CoV-2 PLpro) demonstrate AutoLeadDesign's competence in de novo generation of lead compounds, achieving expert-competitive design efficacy. Structural analysis further confirms their mechanism-validated inhibitory patterns. By tracing the process of design, we find that AutoLeadDesign shares analogous mechanisms with fragment-based drug design, which traditionally rely on expert decision-making, further revealing why it works. Overall, AutoLeadDesign offers an efficient approach for lead compound design, suggesting its potential utility in drug design.


UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design

Kong, Xiangzhe, Zhang, Zishen, Zhang, Ziting, Jiao, Rui, Ma, Jianzhu, Liu, Kai, Huang, Wenbing, Liu, Yang

arXiv.org Artificial Intelligence

The design of target-specific molecules such as small molecules, peptides, and antibodies is vital for biological research and drug discovery. Existing generative methods are restricted to single-domain molecules, failing to address versatile therapeutic needs or utilize cross-domain transferability to enhance model performance. In this paper, we introduce Unified generative Modeling of 3D Molecules (UniMoMo), the first framework capable of designing binders of multiple molecular domains using a single model. In particular, UniMoMo unifies the representations of different molecules as graphs of blocks, where each block corresponds to either a standard amino acid or a molecular fragment. Based on these unified representations, UniMoMo utilizes a geometric latent diffusion model for 3D molecular generation, featuring an iterative full-atom autoencoder to compress blocks into latent space points, followed by an E(3)-equivariant diffusion process. Extensive benchmarks across peptides, antibodies, and small molecules demonstrate the superiority of our unified framework over existing domain-specific models, highlighting the benefits of multi-domain training.